Abstract

If $S$ is an infinite sequence over a finite alphabet $\Sigma$ and $\beta$ is a probability measure on $\Sigma$, then the dimension of $S$ with respect to $\beta$, written $\dim^\beta(S)$, is a constructive version of Billingsley dimension that coincides with the (constructive Hausdorff) dimension $\dim(S)$ when $\beta$ is the uniform probability measure. This paper shows that $\dim^\beta(S)$ and its dual $\mathrm{Dim}^\beta(S)$, the strong dimension of $S$ with respect to $\beta$, can be used in conjunction with randomness to measure the similarity of two probability measures $\alpha$ and $\beta$ on $\Sigma$. Specifically, we prove that the divergence formula

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)}$$

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with respect to $\alpha$. In this formula, $\mathcal{H}(\alpha)$ is the Shannon entropy of $\alpha$, and $\mathcal{D}(\alpha\|\beta)$ is the Kullback-Leibler divergence between $\alpha$ and $\beta$. We also show that the above formula holds for all sequences $R$ that are $\alpha$-normal (in the sense of Borel) when $\dim^\beta(R)$ and $\mathrm{Dim}^\beta(R)$ are replaced by the more effective finite-state dimensions $\dim^\beta_{\mathrm{FS}}(R)$ and $\mathrm{Dim}^\beta_{\mathrm{FS}}(R)$. In the course of proving this, we also prove finite-state compression characterizations of $\dim^\beta_{\mathrm{FS}}(S)$ and $\mathrm{Dim}^\beta_{\mathrm{FS}}(S)$.



1 Introduction 




The constructive dimension $\dim(S)$ and the constructive strong dimension $\mathrm{Dim}(S)$ of an infinite sequence $S$ over a finite alphabet $\Sigma$ are constructive versions of the two most important classical fractal dimensions, namely, Hausdorff dimension [9] and packing dimension [22], [21], respectively. These two constructive dimensions, which were introduced in [13], [1], have been shown to have the useful characterizations

$$\dim(S) = \liminf_{w \to S} \frac{K(w)}{|w| \log |\Sigma|} \tag{1.1}$$

and

$$\mathrm{Dim}(S) = \limsup_{w \to S} \frac{K(w)}{|w| \log |\Sigma|}, \tag{1.2}$$

where the logarithm is base-2 [16], [1]. In these equations, $K(w)$ is the Kolmogorov complexity of the prefix $w$ of $S$, i.e., the length in bits of the shortest program that prints the string $w$. (See section 2.6 or [11] for details.) The numerators in these equations are thus the algorithmic information content of $w$, while the denominators are the "naive" information content of $w$, also in bits. We thus understand (1.1) and (1.2) to say that $\dim(S)$ and $\mathrm{Dim}(S)$ are the lower and upper information densities of the sequence $S$. These constructive dimensions and their analogs at other levels of effectivity have been investigated extensively in recent years [10].

The constructive dimensions $\dim(S)$ and $\mathrm{Dim}(S)$ have recently been generalized to incorporate a probability measure $\nu$ on the sequence space $\Sigma^\infty$ as a parameter [14]. Specifically, for each such $\nu$ and each sequence $S \in \Sigma^\infty$, we now have the constructive dimension $\dim^\nu(S)$ and the constructive strong dimension $\mathrm{Dim}^\nu(S)$ of $S$ with respect to $\nu$. (The first of these is a constructive version of Billingsley dimension [2].) When $\nu$ is the uniform probability measure on $\Sigma^\infty$, we have $\dim^\nu(S) = \dim(S)$ and $\mathrm{Dim}^\nu(S) = \mathrm{Dim}(S)$. A more interesting example occurs when $\nu$ is the product measure generated by a nonuniform probability measure $\beta$ on the alphabet $\Sigma$. In this case, $\dim^\nu(S)$ and $\mathrm{Dim}^\nu(S)$, which we write as $\dim^\beta(S)$ and $\mathrm{Dim}^\beta(S)$, are again the lower and upper information densities of $S$, but these densities are now measured with respect to unequal letter costs. Specifically, it was shown in [14] that

$$\dim^\beta(S) = \liminf_{w \to S} \frac{K(w)}{\mathcal{I}_\beta(w)} \tag{1.3}$$

and

$$\mathrm{Dim}^\beta(S) = \limsup_{w \to S} \frac{K(w)}{\mathcal{I}_\beta(w)}, \tag{1.4}$$

where

$$\mathcal{I}_\beta(w) = \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])}$$

is the Shannon self-information of $w$ with respect to $\beta$. These unequal letter costs $\log(1/\beta(a))$ for $a \in \Sigma$ can in fact be useful. For example, the complete analysis of the dimensions of individual points in self-similar fractals given by [14] requires these constructive dimensions with a particular choice of the probability measure $\beta$ on $\Sigma$.

In this paper we show how to use the constructive dimensions $\dim^\beta(S)$ and $\mathrm{Dim}^\beta(S)$ in conjunction with randomness to measure the degree to which two probability measures on $\Sigma$ are similar. To see why this might be possible, we note that the inequalities

$$0 \le \dim^\beta(S) \le \mathrm{Dim}^\beta(S) \le 1$$

hold for all $\beta$ and $S$ and that the maximum values

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = 1 \tag{1.5}$$

are achieved whenever the sequence $R$ is random with respect to $\beta$. It is thus reasonable to hope that, if $R$ is random with respect to some other probability measure $\alpha$ on $\Sigma$, then $\dim^\beta(R)$ and $\mathrm{Dim}^\beta(R)$ will take on values whose closeness to 1 reflects the degree to which $\alpha$ is similar to $\beta$. This is indeed the case. Our first main theorem says that the divergence formula

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)} \tag{1.6}$$

holds whenever $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is random with respect to $\alpha$. In this formula, $\mathcal{H}(\alpha)$ is the Shannon entropy of $\alpha$, and $\mathcal{D}(\alpha\|\beta)$ is the Kullback-Leibler divergence between $\alpha$ and $\beta$. When $\alpha = \beta$, the Kullback-Leibler divergence $\mathcal{D}(\alpha\|\beta)$ is 0, so (1.6) coincides with (1.5). When $\alpha$ and $\beta$ are dissimilar, the Kullback-Leibler divergence $\mathcal{D}(\alpha\|\beta)$ is large, so the right-hand side of (1.6) is small. Hence the divergence formula tells us that, when $R$ is $\alpha$-random, $\dim^\beta(R) = \mathrm{Dim}^\beta(R)$ is a quantity in $[0, 1]$ whose closeness to 1 is an indicator of the similarity between $\alpha$ and $\beta$.
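The right-hand side of (1.6) is easy to evaluate for concrete measures. The following Python sketch is illustrative only: it assumes that measures on $\Sigma$ are given as dictionaries from symbols to probabilities, and the helper names `entropy`, `kl_divergence`, and `divergence_ratio` are hypothetical. It shows the ratio equal to 1 when $\alpha = \beta$ and dropping below 1 as the two measures move apart.

```python
import math

def entropy(alpha):
    """Shannon entropy H(alpha) in bits, using the convention 0 * log(1/0) = 0."""
    return sum(p * math.log2(1 / p) for p in alpha.values() if p > 0)

def kl_divergence(alpha, beta):
    """Kullback-Leibler divergence D(alpha || beta) in bits; beta is assumed positive."""
    return sum(p * math.log2(p / beta[a]) for a, p in alpha.items() if p > 0)

def divergence_ratio(alpha, beta):
    """H(alpha) / (H(alpha) + D(alpha || beta)), the right-hand side of (1.6).

    Assumes alpha is positive on an alphabet of size k >= 2, so that H(alpha) > 0.
    """
    h = entropy(alpha)
    return h / (h + kl_divergence(alpha, beta))

alpha = {"0": 0.5, "1": 0.5}
beta = {"0": 0.9, "1": 0.1}
print(divergence_ratio(alpha, alpha))  # 1.0
print(divergence_ratio(alpha, beta))   # about 0.576, since D(alpha || beta) = log2(5/3)
```

Taking $\beta$ uniform in this sketch makes the ratio collapse to $\mathcal{H}(\alpha)/\log k$, because then $\mathcal{D}(\alpha\|\beta) = \log k - \mathcal{H}(\alpha)$.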

The proof of (1.6) serves as an outline of our other, more challenging task, which is to prove that the divergence formula (1.6) also holds for the much more effective finite-state $\beta$-dimension $\dim^\beta_{\mathrm{FS}}(R)$ and finite-state strong $\beta$-dimension $\mathrm{Dim}^\beta_{\mathrm{FS}}(R)$. (These dimensions, defined in section 2.5, are generalizations of finite-state dimension and finite-state strong dimension, which were introduced in [6] and [1], respectively.)

With this objective in mind, our second main theorem characterizes the finite-state $\beta$-dimensions in terms of finite-state data compression. Specifically, this theorem says that, in analogy with (1.3) and (1.4), the identities

$$\dim^\beta_{\mathrm{FS}}(S) = \inf_C \liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{1.7}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \inf_C \limsup_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{1.8}$$

hold for all infinite sequences $S$ over $\Sigma$. The infima here are taken over all information-lossless finite-state compressors $C$ (a model introduced by Shannon [20] and investigated extensively ever since) with output alphabet $\{0, 1\}$, and $|C(w)|$ denotes the number of bits that $C$ outputs when processing the prefix $w$ of $S$. The special cases of (1.7) and (1.8) in which $\beta$ is the uniform probability measure on $\Sigma$, and hence $\mathcal{I}_\beta(w) = |w| \log |\Sigma|$, were proven in [6], [1]. In fact, our proof uses these special cases as "black boxes" from which we derive the more general (1.7) and (1.8).

With (1.7) and (1.8) in hand, we prove our third main theorem. This involves the finite-state version of randomness, which was introduced by Borel [3] long before finite-state automata were defined. If $\alpha$ is a probability measure on $\Sigma$, then a sequence $S \in \Sigma^\infty$ is $\alpha$-normal in the sense of Borel if every finite string $w \in \Sigma^*$ appears with asymptotic frequency $\alpha(w)$ in $S$, where we write

$$\alpha(w) = \prod_{i=0}^{|w|-1} \alpha(w[i]).$$

(See section 2.5 for a precise definition of asymptotic frequency.) Our third main theorem says that the divergence formula

$$\dim^\beta_{\mathrm{FS}}(R) = \mathrm{Dim}^\beta_{\mathrm{FS}}(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)} \tag{1.9}$$

holds whenever $\alpha$ and $\beta$ are positive probability measures on $\Sigma$ and $R \in \Sigma^\infty$ is $\alpha$-normal.

In section 2 we briefly review ideas from Shannon information theory, classical fractal dimensions, algorithmic information theory, and effective fractal dimensions that are used in this paper. Section 3 outlines the proofs of (1.6), section 4 outlines the proofs of (1.7) and (1.8), and section 5 outlines the proof of (1.9). Various proofs are consigned to a technical appendix.
2 Preliminaries



2.1 Notation and setting 

Throughout this paper we work in a finite alphabet $\Sigma = \{0, 1, \ldots, k-1\}$, where $k \ge 2$. We write $\Sigma^*$ for the set of (finite) strings over $\Sigma$ and $\Sigma^\infty$ for the set of (infinite) sequences over $\Sigma$. We write $|w|$ for the length of a string $w$ and $\lambda$ for the empty string. For $w \in \Sigma^*$ and $0 \le i < |w|$, $w[i]$ is the $i$th symbol in $w$. Similarly, for $S \in \Sigma^\infty$ and $i \in \mathbb{N}$ $(= \{0, 1, 2, \ldots\})$, $S[i]$ is the $i$th symbol in $S$. Note that the leftmost symbol in a string or sequence is the 0th symbol.

A prefix of a string or sequence $x \in \Sigma^* \cup \Sigma^\infty$ is a string $w \in \Sigma^*$ for which there exists a string or sequence $y \in \Sigma^* \cup \Sigma^\infty$ such that $x = wy$. In this case we write $w \sqsubseteq x$. The equation $\lim_{w \to S} f(w) = L$ means that, for all $\varepsilon > 0$, for all sufficiently long prefixes $w \sqsubseteq S$, $|f(w) - L| < \varepsilon$. We also use the limit inferior,

$$\liminf_{w \to S} f(w) = \lim_{w \to S} \inf \{ f(x) \mid w \sqsubseteq x \sqsubseteq S \},$$

and the limit superior

$$\limsup_{w \to S} f(w) = \lim_{w \to S} \sup \{ f(x) \mid w \sqsubseteq x \sqsubseteq S \}.$$

2.2 Probability measures, gales, and Shannon information 

A probability measure on $\Sigma$ is a function $\alpha : \Sigma \to [0, 1]$ such that $\sum_{a \in \Sigma} \alpha(a) = 1$. A probability measure $\alpha$ on $\Sigma$ is positive if $\alpha(a) > 0$ for every $a \in \Sigma$. A probability measure $\alpha$ on $\Sigma$ is rational if $\alpha(a) \in \mathbb{Q}$ (i.e., $\alpha(a)$ is a rational number) for every $a \in \Sigma$.

A probability measure on $\Sigma^\infty$ is a function $\nu : \Sigma^* \to [0, 1]$ such that $\nu(\lambda) = 1$ and, for all $w \in \Sigma^*$, $\nu(w) = \sum_{a \in \Sigma} \nu(wa)$. (Intuitively, $\nu(w)$ is the probability that $w \sqsubseteq S$ when the sequence $S \in \Sigma^\infty$ is "chosen according to $\nu$.") Each probability measure $\alpha$ on $\Sigma$ naturally induces the probability measure $\alpha$ on $\Sigma^\infty$ defined by

$$\alpha(w) = \prod_{i=0}^{|w|-1} \alpha(w[i]) \tag{2.1}$$

for all $w \in \Sigma^*$.

We reserve the symbol $\mu$ for the uniform probability measure on $\Sigma$, i.e.,

$$\mu(a) = \frac{1}{k} \quad \text{for all } a \in \Sigma,$$

and also for the uniform probability measure on $\Sigma^\infty$, i.e.,

$$\mu(w) = k^{-|w|} \quad \text{for all } w \in \Sigma^*.$$

If $\alpha$ is a probability measure on $\Sigma$ and $s \in [0, \infty)$, then an $s$-$\alpha$-gale is a function $d : \Sigma^* \to [0, \infty)$ satisfying

$$d(w) = \sum_{a \in \Sigma} d(wa)\, \alpha(a)^s \tag{2.2}$$

for all $w \in \Sigma^*$. A 1-$\alpha$-gale is also called an $\alpha$-martingale. When $\alpha = \mu$, we omit it from this terminology, so an $s$-$\mu$-gale is called an $s$-gale, and a $\mu$-martingale is called a martingale.
We frequently use the following simple fact without explicit citation. 
Observation 2.1. Let $\alpha$ and $\beta$ be positive probability measures on $\Sigma$, and let $s, t \in [0, \infty)$. If $d : \Sigma^* \to [0, \infty)$ is an $s$-$\alpha$-gale, then the function $d' : \Sigma^* \to [0, \infty)$ defined by

$$d'(w) = \frac{\alpha(w)^s}{\beta(w)^t}\, d(w)$$

is a $t$-$\beta$-gale.
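Observation 2.1 can be sanity-checked numerically. The Python sketch below is a finite check, not a proof, under these assumptions: measures on $\Sigma$ are dictionaries, strings are Python strings, and the starting $s$-$\alpha$-gale is $d(w) = \gamma(w)/\alpha(w)^s$ for an auxiliary product measure $\gamma$ (an illustrative choice that satisfies (2.2)). All helper names are hypothetical.

```python
import itertools
import math

def product_measure(alpha, w):
    """alpha(w) = product of alpha(w[i]), as in (2.1); the empty product is 1."""
    return math.prod(alpha[a] for a in w)

def gale_from_measure(gamma, alpha, s):
    """d(w) = gamma(w) / alpha(w)**s, an s-alpha-gale for any probability measure gamma on the alphabet."""
    return lambda w: product_measure(gamma, w) / product_measure(alpha, w) ** s

def transform(d, alpha, beta, s, t):
    """The function of Observation 2.1: w -> alpha(w)**s / beta(w)**t * d(w)."""
    return lambda w: product_measure(alpha, w) ** s / product_measure(beta, w) ** t * d(w)

def satisfies_gale_condition(d, beta, t, alphabet, max_len=6):
    """Check (2.2), d(w) = sum over a of d(wa) * beta(a)**t, for every string w with |w| <= max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            w = "".join(letters)
            rhs = sum(d(w + a) * beta[a] ** t for a in alphabet)
            if not math.isclose(d(w), rhs, rel_tol=1e-9):
                return False
    return True

alphabet = "01"
alpha = {"0": 0.3, "1": 0.7}
beta = {"0": 0.6, "1": 0.4}
gamma = {"0": 0.5, "1": 0.5}
d = gale_from_measure(gamma, alpha, s=0.8)
d_prime = transform(d, alpha, beta, s=0.8, t=0.5)
print(satisfies_gale_condition(d, alpha, 0.8, alphabet))        # True
print(satisfies_gale_condition(d_prime, beta, 0.5, alphabet))   # True
```

The same check applied to $d$ itself with $\beta$ and $t$ in place of $\alpha$ and $s$ fails, which is why the correction factor $\alpha(w)^s/\beta(w)^t$ is needed.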



Intuitively, an $s$-$\alpha$-gale is a strategy for betting on the successive symbols in a sequence $S \in \Sigma^\infty$. For each prefix $w \sqsubseteq S$, $d(w)$ denotes the amount of capital (money) that the gale $d$ has after betting on the symbols in $w$. If $s = 1$, then the right-hand side of (2.2) is the conditional expectation of $d(wa)$, given that $w$ has occurred, so (2.2) says that the payoffs are fair. If $s < 1$, then (2.2) says that the payoffs are unfair.

Let $d$ be a gale, and let $S \in \Sigma^\infty$. Then $d$ succeeds on $S$ if $\limsup_{w \to S} d(w) = \infty$, and $d$ succeeds strongly on $S$ if $\liminf_{w \to S} d(w) = \infty$. The success set of $d$ is the set $S^\infty[d]$ of all sequences on which $d$ succeeds, and the strong success set of $d$ is the set $S^\infty_{\mathrm{str}}[d]$ of all sequences on which $d$ succeeds strongly.

The Shannon entropy of a probability measure $\alpha$ on $\Sigma$ is

$$\mathcal{H}(\alpha) = \sum_{a \in \Sigma} \alpha(a) \log \frac{1}{\alpha(a)},$$

where $0 \log \frac{1}{0} = 0$. (Unless otherwise indicated, all logarithms in this paper are base-2.) The Kullback-Leibler divergence between two probability measures $\alpha$ and $\beta$ on $\Sigma$ is

$$\mathcal{D}(\alpha\|\beta) = \sum_{a \in \Sigma} \alpha(a) \log \frac{\alpha(a)}{\beta(a)}.$$

The Kullback-Leibler divergence is used to quantify how "far apart" the two probability measures $\alpha$ and $\beta$ are. The Shannon self-information of a string $w \in \Sigma^*$ with respect to a probability measure $\beta$ on $\Sigma$ is

$$\mathcal{I}_\beta(w) = \log \frac{1}{\beta(w)} = \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])}.$$

Discussions of $\mathcal{H}(\alpha)$, $\mathcal{D}(\alpha\|\beta)$, $\mathcal{I}_\beta(w)$ and their properties may be found in any good text on information theory, e.g., [5].

2.3 Hausdorff, packing, and Billingsley dimensions 

Given a probability measure $\beta$ on $\Sigma$, each set $X \subseteq \Sigma^\infty$ has a Hausdorff dimension $\dim(X)$, a packing dimension $\mathrm{Dim}(X)$, a Billingsley dimension $\dim^\beta(X)$, and a strong Billingsley dimension $\mathrm{Dim}^\beta(X)$, all of which are real numbers in the interval $[0, 1]$. In this paper we are not concerned with the original definitions of these classical dimensions, but rather with their recent characterizations (which may be taken as definitions) in terms of gales.

Notation. For each probability measure $\beta$ on $\Sigma$ and each set $X \subseteq \Sigma^\infty$, let $\mathcal{G}^\beta(X)$ (respectively, $\mathcal{G}^{\beta,\mathrm{str}}(X)$) be the set of all $s \in [0, \infty)$ such that there is a $\beta$-$s$-gale $d$ satisfying $X \subseteq S^\infty[d]$ (respectively, $X \subseteq S^\infty_{\mathrm{str}}[d]$). When $\beta = \mu$, we write $\mathcal{G}(X)$ and $\mathcal{G}^{\mathrm{str}}(X)$.

Theorem 2.2 (gale characterizations of classical fractal dimensions). Let $\beta$ be a probability measure on $\Sigma$, and let $X \subseteq \Sigma^\infty$.

1. [12] $\dim(X) = \inf \mathcal{G}(X)$.

2. [1] $\mathrm{Dim}(X) = \inf \mathcal{G}^{\mathrm{str}}(X)$.

3. $\dim^\beta(X) = \inf \mathcal{G}^\beta(X)$.

4. $\mathrm{Dim}^\beta(X) = \inf \mathcal{G}^{\beta,\mathrm{str}}(X)$.

2.4 Randomness and constructive dimensions 

Randomness and constructive dimensions are defined by imposing computability constraints on gales.

A real-valued function $f : \Sigma^* \to \mathbb{R}$ is computable if there is a computable, rational-valued function $\hat{f} : \Sigma^* \times \mathbb{N} \to \mathbb{Q}$ such that, for all $w \in \Sigma^*$ and $r \in \mathbb{N}$,

$$|\hat{f}(w, r) - f(w)| \le 2^{-r}.$$

A real-valued function $f : \Sigma^* \to \mathbb{R}$ is constructive, or lower semicomputable, if there is a computable, rational-valued function $\hat{f} : \Sigma^* \times \mathbb{N} \to \mathbb{Q}$ such that

(i) for all $w \in \Sigma^*$ and $t \in \mathbb{N}$, $\hat{f}(w, t) \le \hat{f}(w, t+1) \le f(w)$, and

(ii) for all $w \in \Sigma^*$, $f(w) = \lim_{t \to \infty} \hat{f}(w, t)$.

The first successful definition of the randomness of individual sequences $S \in \Sigma^\infty$ was formulated by Martin-Löf [15]. Many characterizations (equivalent definitions) of randomness are now known, of which the following is the most pertinent.

Theorem 2.3 (Schnorr [17], [18]). Let $\alpha$ be a probability measure on $\Sigma$. A sequence $S \in \Sigma^\infty$ is random with respect to $\alpha$ (or, briefly, $\alpha$-random) if and only if there is no constructive $\alpha$-martingale that succeeds on $S$.

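Motivated by Theorem 2.2, we now define the constructive dimensions.

Notation. We define the sets $\mathcal{G}^\beta_{\mathrm{constr}}(X)$ and $\mathcal{G}^{\beta,\mathrm{str}}_{\mathrm{constr}}(X)$ to be like the sets $\mathcal{G}^\beta(X)$ and $\mathcal{G}^{\beta,\mathrm{str}}(X)$ of section 2.3, except that the $\beta$-$s$-gales are now required to be constructive.

Definition. Let $\beta$ be a probability measure on $\Sigma$, let $X \subseteq \Sigma^\infty$, and let $S \in \Sigma^\infty$.

1. [13] The constructive dimension of $X$ is $\mathrm{cdim}(X) = \inf \mathcal{G}_{\mathrm{constr}}(X)$.

2. [1] The constructive strong dimension of $X$ is $\mathrm{cDim}(X) = \inf \mathcal{G}^{\mathrm{str}}_{\mathrm{constr}}(X)$.

3. [14] The constructive $\beta$-dimension of $X$ is $\mathrm{cdim}^\beta(X) = \inf \mathcal{G}^\beta_{\mathrm{constr}}(X)$.

4. [14] The constructive strong $\beta$-dimension of $X$ is $\mathrm{cDim}^\beta(X) = \inf \mathcal{G}^{\beta,\mathrm{str}}_{\mathrm{constr}}(X)$.

5. [13] The dimension of $S$ is $\dim(S) = \mathrm{cdim}(\{S\})$.

6. [1] The strong dimension of $S$ is $\mathrm{Dim}(S) = \mathrm{cDim}(\{S\})$.

7. [14] The $\beta$-dimension of $S$ is $\dim^\beta(S) = \mathrm{cdim}^\beta(\{S\})$.

8. [14] The strong $\beta$-dimension of $S$ is $\mathrm{Dim}^\beta(S) = \mathrm{cDim}^\beta(\{S\})$.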

It is clear that definitions 1, 2, 5, and 6 above are the special case $\beta = \mu$ of definitions 3, 4, 7, and 8, respectively. It is known that $\mathrm{cdim}^\beta(X) = \sup_{S \in X} \dim^\beta(S)$ and that $\mathrm{cDim}^\beta(X) = \sup_{S \in X} \mathrm{Dim}^\beta(S)$ [14]. Constructive dimensions are thus investigated in terms of the dimensions of individual sequences. Since one does not discuss the classical dimension of an individual sequence (because the dimensions of section 2.3 are all zero for singleton, or even countable, sets), no confusion results from the notation $\dim(S)$, $\mathrm{Dim}(S)$, $\dim^\beta(S)$, and $\mathrm{Dim}^\beta(S)$.
2.5 Normality and finite-state dimensions

The preceding section developed the constructive dimensions as effective versions of the classical dimensions of section 2.3. We now introduce the even more effective finite-state dimensions.



Notation. $\Delta_{\mathbb{Q}}(\Sigma)$ is the set of all rational-valued probability measures on $\Sigma$.

Definition ([19], [6], [1]). A finite-state gambler (FSG) is a 4-tuple

$$G = (Q, \delta, q_0, B),$$

where $Q$ is a finite set of states, $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0 \in Q$ is the initial state, and $B : Q \to \Delta_{\mathbb{Q}}(\Sigma)$ is the betting function.

The transition structure $(Q, \delta, q_0)$ here works as in any deterministic finite-state automaton. For $w \in \Sigma^*$, we write $\delta(w)$ for the state reached by starting at $q_0$ and processing $w$ according to $\delta$.

Intuitively, if the above FSG is in state $q \in Q$, then, for each $a \in \Sigma$, it bets the fraction $B(q)(a)$ of its current capital that the next input symbol is an $a$. The payoffs are determined as follows.

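Definition. Let $G = (Q, \delta, q_0, B)$ be an FSG.

1. The martingale of $G$ is the function $d_G : \Sigma^* \to [0, \infty)$ defined by the recursion

$$d_G(\lambda) = 1, \qquad d_G(wa) = k\, d_G(w)\, B(\delta(w))(a)$$

for all $w \in \Sigma^*$ and $a \in \Sigma$.

2. If $\beta$ is a probability measure on $\Sigma$ and $s \in [0, \infty)$, then the $s$-$\beta$-gale of $G$ is the function $d^{(s)}_{G,\beta} : \Sigma^* \to [0, \infty)$ defined by

$$d^{(s)}_{G,\beta}(w) = \frac{\mu(w)}{\beta(w)^s}\, d_G(w)$$

for all $w \in \Sigma^*$.

It is easy to verify that $d_G = d^{(1)}_{G,\mu}$ is a martingale. It follows by Observation 2.1 that $d^{(s)}_{G,\beta}$ is an $s$-$\beta$-gale.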
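The two gales of an FSG can be computed directly from the definitions above. The Python sketch below is a minimal illustration under assumed encodings (dictionaries for $\delta$ and $B$, `Fraction` bets so that $d_G$ is exact); the class name `FSG` and the example gambler are hypothetical.

```python
import math
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class FSG:
    """A finite-state gambler G = (Q, delta, q0, B); states are arbitrary hashable labels."""
    alphabet: str
    delta: dict  # (state, symbol) -> next state
    q0: str
    bets: dict   # state -> {symbol: Fraction}, a rational probability measure on the alphabet

    def martingale(self, w):
        """d_G(w): start with capital 1; in state q on symbol a, multiply capital by k * B(q)(a)."""
        k = len(self.alphabet)
        capital, q = Fraction(1), self.q0
        for a in w:
            capital *= k * self.bets[q][a]
            q = self.delta[(q, a)]
        return capital

    def gale(self, w, beta, s):
        """d^(s)_{G,beta}(w) = mu(w) / beta(w)**s * d_G(w), where mu(w) = k**(-|w|)."""
        k = len(self.alphabet)
        beta_w = math.prod(beta[a] for a in w)
        return float(self.martingale(w)) * k ** (-len(w)) / beta_w ** s

# Illustrative gambler: its state is the last symbol read, and it bets 3/4 that this symbol repeats.
G = FSG(
    alphabet="01",
    delta={(q, a): a for q in "01" for a in "01"},
    q0="0",
    bets={"0": {"0": Fraction(3, 4), "1": Fraction(1, 4)},
          "1": {"0": Fraction(1, 4), "1": Fraction(3, 4)}},
)
print(G.martingale("0000"))  # 81/16
```

On the all-zero prefix the gambler multiplies its capital by $2 \cdot 3/4 = 3/2$ at each step, which is where $(3/2)^4 = 81/16$ comes from.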

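Definition. A finite-state $s$-$\beta$-gale is an $s$-$\beta$-gale of the form $d^{(s)}_{G,\beta}$ for some FSG $G$.

Notation. We define the sets $\mathcal{G}^\beta_{\mathrm{FS}}(X)$ and $\mathcal{G}^{\beta,\mathrm{str}}_{\mathrm{FS}}(X)$ to be like the sets $\mathcal{G}^\beta(X)$ and $\mathcal{G}^{\beta,\mathrm{str}}(X)$ of section 2.3, except that the $s$-$\beta$-gales are now required to be finite-state.

Definition. Let $\beta$ be a probability measure on $\Sigma$, and let $S \in \Sigma^\infty$.

1. [6] The finite-state dimension of $S$ is $\dim_{\mathrm{FS}}(S) = \inf \mathcal{G}_{\mathrm{FS}}(\{S\})$.

2. [1] The finite-state strong dimension of $S$ is $\mathrm{Dim}_{\mathrm{FS}}(S) = \inf \mathcal{G}^{\mathrm{str}}_{\mathrm{FS}}(\{S\})$.

3. The finite-state $\beta$-dimension of $S$ is $\dim^\beta_{\mathrm{FS}}(S) = \inf \mathcal{G}^\beta_{\mathrm{FS}}(\{S\})$.

4. The finite-state strong $\beta$-dimension of $S$ is $\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \inf \mathcal{G}^{\beta,\mathrm{str}}_{\mathrm{FS}}(\{S\})$.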
We now turn to some ideas based on asymptotic frequencies of strings in a given sequence. For nonempty strings $w, x \in \Sigma^*$, we write

$$\#_\square(w, x) = \left| \left\{ m \le \frac{|x|}{|w|} - 1 \;\middle|\; x[m|w| \,..\, (m+1)|w| - 1] = w \right\} \right|$$

for the number of block occurrences of $w$ in $x$. For each sequence $S \in \Sigma^\infty$, each positive integer $n$, and each nonempty $w \in \Sigma^*$, the $n$th block frequency of $w$ in $S$ is

$$\pi_{S,n}(w) = \frac{\#_\square(w, S[0 \,..\, n|w| - 1])}{n}.$$

Note that, for each $n$ and $l$, the restriction $\pi^{(l)}_{S,n}$ of $\pi_{S,n}$ to $\Sigma^l$ is a probability measure on $\Sigma^l$.
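Block occurrences and block frequencies are straightforward to compute on a finite prefix. This Python sketch assumes the prefix is a string with at least $n\,l$ symbols; the function names are hypothetical. Only aligned, non-overlapping blocks are counted, as in the definition of $\#_\square$, so an occurrence that straddles a block boundary does not count.

```python
from collections import Counter

def block_occurrences(w, x):
    """#_box(w, x): number of m with x[m|w| .. (m+1)|w| - 1] == w (aligned, non-overlapping blocks)."""
    l = len(w)
    return sum(1 for m in range(len(x) // l) if x[m * l:(m + 1) * l] == w)

def block_frequencies(S, n, l):
    """pi^(l)_{S,n}: frequencies of the n aligned length-l blocks of the prefix S[0 .. n*l - 1]."""
    prefix = S[: n * l]
    counts = Counter(prefix[m * l:(m + 1) * l] for m in range(n))
    return {w: c / n for w, c in counts.items()}

S = "0110" * 8 + "1"  # a finite prefix standing in for a sequence
print(block_occurrences("01", S[:8]))  # 2
print(block_frequencies(S, 4, 2))      # {'01': 0.5, '10': 0.5}
```

Blocks of length $l$ that never occur are simply absent from the returned dictionary; they have frequency 0.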
Definition. Let $\alpha$ be a probability measure on $\Sigma$, let $S \in \Sigma^\infty$, and let $0 < l \in \mathbb{N}$.

1. $S$ is $\alpha$-$l$-normal in the sense of Borel if, for all $w \in \Sigma^l$, $\lim_{n \to \infty} \pi_{S,n}(w) = \alpha(w)$.

2. $S$ is $\alpha$-normal in the sense of Borel if $S$ is $\alpha$-$l$-normal for all $0 < l \in \mathbb{N}$.

3. [3] $S$ is normal in the sense of Borel if $S$ is $\mu$-normal.

4. $S$ has asymptotic frequency $\alpha$, and we write $S \in \mathrm{FREQ}^\alpha$, if $S$ is $\alpha$-1-normal.

Theorem 2.4 ([19], [4]). For each probability measure $\alpha$ on $\Sigma$ and each $S \in \Sigma^\infty$, the following three conditions are equivalent.

(1) $S$ is $\alpha$-normal.

(2) No finite-state $\alpha$-martingale succeeds on $S$.

(3) $\dim^\alpha_{\mathrm{FS}}(S) = 1$.

The equivalence of (1) and (2) when $\alpha = \mu$ was proven in [19]. The equivalence of (2) and (3) when $\alpha = \mu$ was noted in [4]. The extensions of these facts to arbitrary $\alpha$ are routine.

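For each $S \in \Sigma^\infty$ and $0 < l \in \mathbb{N}$, the $l$th normalized lower and upper block entropy rates of $S$ are

$$\mathcal{H}^-_l(S) = \frac{1}{l \log k} \liminf_{n \to \infty} \mathcal{H}(\pi^{(l)}_{S,n})$$

and

$$\mathcal{H}^+_l(S) = \frac{1}{l \log k} \limsup_{n \to \infty} \mathcal{H}(\pi^{(l)}_{S,n}),$$

respectively.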

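We use the following result in section 5.

Theorem 2.5. Let $S \in \Sigma^\infty$.

1. $\dim_{\mathrm{FS}}(S) = \inf_{0 < l \in \mathbb{N}} \mathcal{H}^-_l(S)$.

2. $\mathrm{Dim}_{\mathrm{FS}}(S) = \inf_{0 < l \in \mathbb{N}} \mathcal{H}^+_l(S)$.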
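Theorem 2.5 suggests a crude finite-prefix heuristic: replace each limit by the empirical block entropy of one long prefix, and the infimum by a minimum over a few block lengths. The Python sketch below does exactly that. It is only an illustrative proxy (no finite prefix determines $\dim_{\mathrm{FS}}(S)$), and the function names and the cutoff `max_l` are assumptions.

```python
import math
from collections import Counter

def normalized_block_entropy(prefix, l, k):
    """H(pi^(l)_{S,n}) / (l log2 k), using the n = len(prefix) // l aligned blocks of length l."""
    n = len(prefix) // l
    counts = Counter(prefix[m * l:(m + 1) * l] for m in range(n))
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return h / (l * math.log2(k))

def finite_state_dimension_proxy(prefix, k, max_l=8):
    """Heuristic proxy for inf_l H_l(S): the minimum over l <= max_l of the empirical normalized entropies."""
    return min(normalized_block_entropy(prefix, l, k) for l in range(1, max_l + 1))

periodic = "01" * 5000
print(normalized_block_entropy(periodic, 1, 2))   # 1.0: single-symbol frequencies look balanced
print(finite_state_dimension_proxy(periodic, 2))  # 0.0: length-2 blocks expose the period
```

The periodic example shows why the infimum over $l$ matters: at $l = 1$ the sequence looks maximally random, while longer blocks reveal that a finite-state gambler can exploit it completely.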
2.6 Kolmogorov complexity and finite-state compression

We now review known characterizations of constructive and finite-state dimensions that are based on data compression ideas.

The Kolmogorov complexity $K(w)$ of a string $w \in \Sigma^*$ is the minimum length of a program $\pi \in \{0, 1\}^*$ for which $U(\pi) = w$, where $U$ is a fixed universal self-delimiting Turing machine.

Theorem 2.6. Let $\beta$ be a probability measure on $\Sigma$, and let $S \in \Sigma^\infty$.

1. [16] $\dim(S) = \liminf_{w \to S} \frac{K(w)}{|w| \log k}$.

2. [1] $\mathrm{Dim}(S) = \limsup_{w \to S} \frac{K(w)}{|w| \log k}$.

3. [14] $\dim^\beta(S) = \liminf_{w \to S} \frac{K(w)}{\mathcal{I}_\beta(w)}$.

4. [14] $\mathrm{Dim}^\beta(S) = \limsup_{w \to S} \frac{K(w)}{\mathcal{I}_\beta(w)}$.

Definition ([20]).

1. A finite-state compressor (FSC) is a 4-tuple

$$C = (Q, \delta, q_0, \nu),$$

where $Q$, $\delta$, and $q_0$ are as in the FSG definition, and $\nu : Q \times \Sigma \to \{0, 1\}^*$ is the output function.

2. The output of an FSC $C = (Q, \delta, q_0, \nu)$ on an input $w \in \Sigma^*$ is the string $C(w) \in \{0, 1\}^*$ defined by the recursion

$$C(\lambda) = \lambda, \qquad C(wa) = C(w)\, \nu(\delta(w), a)$$

for all $w \in \Sigma^*$ and $a \in \Sigma$.

3. An information-lossless finite-state compressor (ILFSC) is an FSC for which the function

$$\Sigma^* \to \{0, 1\}^* \times Q, \qquad w \mapsto (C(w), \delta(w))$$

is one-to-one.
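The output recursion and the information-lossless condition can be illustrated on small examples. The Python sketch below assumes dictionary encodings of $\delta$ and $\nu$; the function names and the one-state prefix-code compressor are hypothetical examples. Because injectivity is checked only on strings up to a fixed length, a `True` result is evidence, not a proof, that the compressor is an ILFSC.

```python
import itertools

def run_fsc(delta, q0, nu, w):
    """Return the pair (C(w), delta(w)) for the FSC C = (Q, delta, q0, nu) on input w."""
    out, q = [], q0
    for a in w:
        out.append(nu[(q, a)])
        q = delta[(q, a)]
    return "".join(out), q

def injective_up_to(delta, q0, nu, alphabet, max_len):
    """Brute-force check that w -> (C(w), delta(w)) is one-to-one on all w with |w| <= max_len."""
    seen = set()
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            key = run_fsc(delta, q0, nu, "".join(letters))
            if key in seen:
                return False
            seen.add(key)
    return True

# Hypothetical one-state compressor over {0, 1, 2} using the prefix code 0 -> 0, 1 -> 10, 2 -> 11.
delta = {("q", a): "q" for a in "012"}
nu = {("q", "0"): "0", ("q", "1"): "10", ("q", "2"): "11"}
print(run_fsc(delta, "q", nu, "0210"))            # ('011100', 'q')
print(injective_up_to(delta, "q", nu, "012", 6))  # True
```

Dividing the output length by $\mathcal{I}_\beta(w)$ along longer and longer prefixes gives the ratios whose infimum over compressors appears in the compression characterizations of section 4.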
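Theorem 2.7. Let $S \in \Sigma^\infty$.

1. [6] $\dim_{\mathrm{FS}}(S) = \inf_C \liminf_{w \to S} \frac{|C(w)|}{|w| \log k}$.

2. [1] $\mathrm{Dim}_{\mathrm{FS}}(S) = \inf_C \limsup_{w \to S} \frac{|C(w)|}{|w| \log k}$.

Here the infima are taken over all ILFSCs $C$.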



3 Divergence formula for randomness and constructive dimensions

This section proves the divergence formula for $\alpha$-randomness, constructive $\beta$-dimension, and constructive strong $\beta$-dimension. The key point here is that the Kolmogorov complexity characterizations of these $\beta$-dimensions reviewed in section 2.6 immediately imply the following fact.
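Lemma 3.1. If $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$, then, for all $S \in \Sigma^\infty$,

$$\liminf_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)} \le \frac{\dim^\beta(S)}{\dim^\alpha(S)} \le \limsup_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)}$$

and

$$\liminf_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)} \le \frac{\mathrm{Dim}^\beta(S)}{\mathrm{Dim}^\alpha(S)} \le \limsup_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)}.$$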
The following lemma is crucial to our argument, both here and in section 5.

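Lemma 3.2 (frequency divergence lemma). If $\alpha$ and $\beta$ are positive probability measures on $\Sigma$, then, for all $S \in \mathrm{FREQ}^\alpha$,

$$\mathcal{I}_\beta(w) = \big(\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)\big)|w| + o(|w|)$$

as $w \to S$.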
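The frequency divergence lemma can be illustrated empirically. The Python sketch below draws a long pseudorandom string whose symbol frequencies approximate $\alpha$ (a seeded pseudorandom generator only stands in for a sequence in $\mathrm{FREQ}^\alpha$; it is not one), computes $\mathcal{I}_\beta(w)/|w|$, and compares it with $\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)$. It uses the identity $\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta) = \sum_a \alpha(a)\log(1/\beta(a))$, which is also the heart of the appendix proof. The measures and function names are illustrative assumptions.

```python
import math
import random

def self_information(beta, w):
    """I_beta(w) = sum over i of log2(1 / beta(w[i])), in bits."""
    return sum(math.log2(1 / beta[a]) for a in w)

def entropy_plus_divergence(alpha, beta):
    """H(alpha) + D(alpha || beta), computed as the cross-entropy sum_a alpha(a) * log2(1 / beta(a))."""
    return sum(p * math.log2(1 / beta[a]) for a, p in alpha.items() if p > 0)

alpha = {"a": 0.2, "b": 0.3, "c": 0.5}
beta = {"a": 0.5, "b": 0.25, "c": 0.25}
rng = random.Random(2024)  # fixed seed for reproducibility
symbols, weights = zip(*alpha.items())
w = "".join(rng.choices(symbols, weights=weights, k=200_000))
print(self_information(beta, w) / len(w))    # close to 1.8
print(entropy_plus_divergence(alpha, beta))  # 1.8 up to rounding
```

Here $\sum_a \alpha(a)\log(1/\beta(a)) = 0.2 \cdot 1 + 0.3 \cdot 2 + 0.5 \cdot 2 = 1.8$ bits per symbol, and the empirical value fluctuates around it by roughly $10^{-3}$.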

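The next lemma gives a simple relationship between the constructive $\beta$-dimension and the constructive dimension of any sequence that is $\alpha$-1-normal. Here and below, $\mathcal{H}_k(\alpha) = \mathcal{H}(\alpha)/\log k$ and $\mathcal{D}_k(\alpha\|\beta) = \mathcal{D}(\alpha\|\beta)/\log k$ denote the entropy and the divergence measured with base-$k$ logarithms.

Lemma 3.3. If $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$, then, for all $S \in \mathrm{FREQ}^\alpha$,

$$\dim^\beta(S) = \frac{\dim(S)}{\mathcal{H}_k(\alpha) + \mathcal{D}_k(\alpha\|\beta)}$$

and

$$\mathrm{Dim}^\beta(S) = \frac{\mathrm{Dim}(S)}{\mathcal{H}_k(\alpha) + \mathcal{D}_k(\alpha\|\beta)}.$$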

We now recall the following constructive strengthening of a 1949 theorem of Eggleston [7].

Theorem 3.4 ([13], [1]). If $\alpha$ is a computable probability measure on $\Sigma$, then, for every $\alpha$-random sequence $R \in \Sigma^\infty$,

$$\dim(R) = \mathrm{Dim}(R) = \mathcal{H}_k(\alpha).$$
The main result of this section is now clear. 

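Theorem 3.5 (divergence formula for randomness and constructive dimensions). If $\alpha$ and $\beta$ are computable, positive probability measures on $\Sigma$, then, for every $\alpha$-random sequence $R \in \Sigma^\infty$,

$$\dim^\beta(R) = \mathrm{Dim}^\beta(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)}.$$

Proof. Every $\alpha$-random sequence is $\alpha$-normal, so in particular $R \in \mathrm{FREQ}^\alpha$. The theorem now follows immediately from Lemma 3.3 and Theorem 3.4, since $\mathcal{H}_k(\alpha)/(\mathcal{H}_k(\alpha) + \mathcal{D}_k(\alpha\|\beta)) = \mathcal{H}(\alpha)/(\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta))$. □

We note that $\mathcal{D}(\alpha\|\mu) = \log k - \mathcal{H}(\alpha)$, so Theorem 3.4 is the case $\beta = \mu$ of Theorem 3.5.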
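For the reader's convenience, the identity in this remark and the resulting reduction can be checked directly from the definitions in section 2.2, writing $\mathcal{H}_k(\alpha) = \mathcal{H}(\alpha)/\log k$:

```latex
\mathcal{D}(\alpha\|\mu)
  = \sum_{a\in\Sigma} \alpha(a)\log\frac{\alpha(a)}{1/k}
  = \log k \sum_{a\in\Sigma}\alpha(a) - \sum_{a\in\Sigma}\alpha(a)\log\frac{1}{\alpha(a)}
  = \log k - \mathcal{H}(\alpha),
\qquad
\frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha)+\mathcal{D}(\alpha\|\mu)}
  = \frac{\mathcal{H}(\alpha)}{\log k}
  = \mathcal{H}_k(\alpha).
```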
4 Finite-state dimensions and data compression

This section proves finite-state compression characterizations of finite-state $\beta$-dimension and finite-state strong $\beta$-dimension that are analogous to the characterizations given by parts 3 and 4 of Theorem 2.6. Our argument uses the following two technical lemmas, which are proven in the technical appendix.

Lemma 4.1. Let $\beta$ be a positive probability measure on $\Sigma$, and let $C$ be an ILFSC. Assume that $I \subseteq \Sigma^*$, $s > 0$, and $\varepsilon > 0$ have the property that, for all $w \in I$,

$$s > \frac{|C(w)|}{\mathcal{I}_\beta(w)} + \varepsilon. \tag{4.1}$$

Then there exist an FSG $G$ and a real number $\delta > 0$ such that, for all sufficiently long strings $w \in I$,

$$d^{(s)}_{G,\beta}(w) > 2^{\delta|w|}. \tag{4.2}$$

Lemma 4.2. Let $\beta$ be a positive probability measure on $\Sigma$, and let $G$ be an FSG. Assume that $I \subseteq \Sigma^*$, $s > 0$, and $\varepsilon > 0$ have the property that, for all $w \in I$,

$$d^{(s-2\varepsilon)}_{G,\beta}(w) \ge 1. \tag{4.3}$$

Then there is an ILFSC $C$ such that, for all $w \in I$,

$$|C(w)| \le s\, \mathcal{I}_\beta(w). \tag{4.4}$$
We now prove the main result of this section. 

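Theorem 4.3 (compression characterizations of finite-state $\beta$-dimensions). If $\beta$ is a positive probability measure on $\Sigma$, then, for each sequence $S \in \Sigma^\infty$,

$$\dim^\beta_{\mathrm{FS}}(S) = \inf_C \liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{4.5}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \inf_C \limsup_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)}, \tag{4.6}$$

where the infima are taken over all ILFSCs $C$.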

Proof. Let $\beta$ and $S$ be as given. We first prove that the left-hand sides of (4.5) and (4.6) do not exceed the right-hand sides. For this, let $C$ be an ILFSC. It suffices to show that

$$\dim^\beta_{\mathrm{FS}}(S) \le \liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \tag{4.7}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) \le \limsup_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)}. \tag{4.8}$$

To see that (4.7) holds, let $s$ exceed the right-hand side. Then there exist an infinite set $I$ of prefixes of $S$ and an $\varepsilon > 0$ such that (4.1) holds for all $w \in I$. It follows by Lemma 4.1 that there exist an FSG $G$ and a $\delta > 0$ such that, for all sufficiently long $w \in I$, $d^{(s)}_{G,\beta}(w) > 2^{\delta|w|}$. Since $I$ is infinite and $\delta > 0$, this implies that $S \in S^\infty[d^{(s)}_{G,\beta}]$, whence $\dim^\beta_{\mathrm{FS}}(S) \le s$. This establishes (4.7).



The proof that (4.8) holds is identical to the preceding paragraph, except that $I$ is now a cofinite set of prefixes of $S$, so $S \in S^\infty_{\mathrm{str}}[d^{(s)}_{G,\beta}]$.

It remains to be shown that the right-hand sides of (4.5) and (4.6) do not exceed the left-hand sides. To see this for (4.5), let $s > \dim^\beta_{\mathrm{FS}}(S)$. It suffices to show that there is an ILFSC $C$ such that

$$\liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \le s. \tag{4.9}$$

By our choice of $s$, there exists $\varepsilon > 0$ such that $s - 2\varepsilon > \dim^\beta_{\mathrm{FS}}(S)$. Hence there is an FSG $G$ whose $(s-2\varepsilon)$-$\beta$-gale $d^{(s-2\varepsilon)}_{G,\beta}$ succeeds on $S$, so there is an infinite set $I$ of prefixes of $S$ such that (4.3) holds for all $w \in I$. Choose $C$ for $G$, $I$, $s$, and $\varepsilon$ as in Lemma 4.2. Then

$$\liminf_{w \to S} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \le \sup_{w \in I} \frac{|C(w)|}{\mathcal{I}_\beta(w)} \le s \tag{4.10}$$

by (4.4), so (4.9) holds.

The proof that the right-hand side of (4.6) does not exceed the left-hand side is identical to the preceding paragraph, except that the limits inferior in (4.9) and (4.10) are now limits superior, and the set $I$ is now a cofinite set of prefixes of $S$. □

5 Divergence formula for normality and finite-state dimensions

This section proves the divergence formula for $\alpha$-normality, finite-state $\beta$-dimension, and finite-state strong $\beta$-dimension. As should now be clear, Theorem 4.3 enables us to proceed in analogy with section 3.

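Lemma 5.1. If $\alpha$ and $\beta$ are positive probability measures on $\Sigma$, then, for all $S \in \Sigma^\infty$,

$$\liminf_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)} \le \frac{\dim^\beta_{\mathrm{FS}}(S)}{\dim^\alpha_{\mathrm{FS}}(S)} \le \limsup_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)} \tag{5.1}$$

and

$$\liminf_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)} \le \frac{\mathrm{Dim}^\beta_{\mathrm{FS}}(S)}{\mathrm{Dim}^\alpha_{\mathrm{FS}}(S)} \le \limsup_{w \to S} \frac{\mathcal{I}_\alpha(w)}{\mathcal{I}_\beta(w)}. \tag{5.2}$$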

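Lemma 5.2. If $\alpha$ and $\beta$ are positive probability measures on $\Sigma$, then, for all $S \in \mathrm{FREQ}^\alpha$,

$$\dim^\beta_{\mathrm{FS}}(S) = \frac{\dim_{\mathrm{FS}}(S)}{\mathcal{H}_k(\alpha) + \mathcal{D}_k(\alpha\|\beta)}$$

and

$$\mathrm{Dim}^\beta_{\mathrm{FS}}(S) = \frac{\mathrm{Dim}_{\mathrm{FS}}(S)}{\mathcal{H}_k(\alpha) + \mathcal{D}_k(\alpha\|\beta)}.$$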

We next prove a finite-state analog of Theorem 3.4.

Theorem 5.3. If $\alpha$ is a probability measure on $\Sigma$, then, for every $\alpha$-normal sequence $R \in \Sigma^\infty$,

$$\dim_{\mathrm{FS}}(R) = \mathrm{Dim}_{\mathrm{FS}}(R) = \mathcal{H}_k(\alpha).$$
We now have our third main theorem.

Theorem 5.4 (divergence theorem for normality and finite-state dimensions). If $\alpha$ and $\beta$ are positive probability measures on $\Sigma$, then, for every $\alpha$-normal sequence $R \in \Sigma^\infty$,

$$\dim^\beta_{\mathrm{FS}}(R) = \mathrm{Dim}^\beta_{\mathrm{FS}}(R) = \frac{\mathcal{H}(\alpha)}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)}.$$

Proof. Every $\alpha$-normal sequence is $\alpha$-1-normal, i.e., belongs to $\mathrm{FREQ}^\alpha$, so this follows immediately from Lemma 5.2 and Theorem 5.3. □

We again note that $\mathcal{D}(\alpha\|\mu) = \log k - \mathcal{H}(\alpha)$, so Theorem 5.3 is the case $\beta = \mu$ of Theorem 5.4.



Acknowledgments. I thank Xiaoyang Gu and Elvira Mayordomo for useful discussions. 
A Appendix: Various Proofs

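Proof of Lemma 3.2. Assume the hypothesis, and let $S \in \mathrm{FREQ}^\alpha$. For $a \in \Sigma$ and $w \in \Sigma^*$, let $\#(a, w)$ denote the number of occurrences of $a$ in $w$, and let $\mathrm{freq}_a(w) = \#(a, w)/|w|$; since $S \in \mathrm{FREQ}^\alpha$, we have $\mathrm{freq}_a(w) = \alpha(a) + o(1)$ as $w \to S$. Then, as $w \to S$, we have

$$\begin{aligned}
\mathcal{I}_\beta(w) &= \sum_{i=0}^{|w|-1} \log \frac{1}{\beta(w[i])} \\
&= \sum_{a \in \Sigma} \#(a, w) \log \frac{1}{\beta(a)} \\
&= |w| \sum_{a \in \Sigma} \mathrm{freq}_a(w) \log \frac{1}{\beta(a)} \\
&= |w| \sum_{a \in \Sigma} \left( \alpha(a) \log \frac{1}{\alpha(a)} + \alpha(a) \log \frac{\alpha(a)}{\beta(a)} \right) + o(|w|) \\
&= \big(\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)\big)|w| + o(|w|).
\end{aligned}$$

□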

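Proof of Lemma 3.3. Let $\alpha$, $\beta$, and $S$ be as given. By the frequency divergence lemma, we have

$$\frac{\mathcal{I}_\mu(w)}{\mathcal{I}_\beta(w)} = \frac{|w| \log k}{\big(\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta)\big)|w| + o(|w|)} = \frac{\log k}{\mathcal{H}(\alpha) + \mathcal{D}(\alpha\|\beta) + o(1)} = \frac{1}{\mathcal{H}_k(\alpha) + \mathcal{D}_k(\alpha\|\beta)} + o(1)$$

as $w \to S$. The present lemma follows from this and Lemma 3.1, applied with $\mu$ in place of $\alpha$. □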
The following lemma summarizes the first part of the proof of Theorem 2.7.

Lemma A.1 ([6]). For each ILFSC $C$ there is an integer $m \in \mathbb{Z}^+$ such that, for each $l \in \mathbb{Z}^+$, there is an FSG $G$ such that, for all $w \in \Sigma^*$,

$$\log d_G(w) \ge |w| \log k - |C(w)| - m\left(\frac{|w|}{l} + l\right). \tag{A.1}$$
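Proof of Lemma 4.1. Assume the hypothesis. Let

$$\delta_\beta = \min_{a \in \Sigma} \log \frac{1}{\beta(a)},$$

noting the following two things.

(i) $\delta_\beta > 0$, because $\beta$ is positive and $k \ge 2$, so $\beta(a) < 1$ for every $a \in \Sigma$.

(ii) For all $w \in \Sigma^*$, $\mathcal{I}_\beta(w) \ge \delta_\beta |w|$.

Choose $m$ for $C$ as in Lemma A.1, let

$$l = \left\lceil \frac{3m}{\varepsilon \delta_\beta} \right\rceil,$$

and choose $G$ for $C$, $m$, and $l$ as in Lemma A.1. Let

$$\delta = \frac{\varepsilon \delta_\beta}{3},$$

noting that $\delta > 0$ and that, since $m/l \le \varepsilon\delta_\beta/3$, every $w$ with $|w| \ge l^2$ satisfies

$$\begin{aligned}
\varepsilon\delta_\beta|w| - m\left(\frac{|w|}{l} + l\right) &= \varepsilon\delta_\beta|w| - \frac{m}{l}\left(|w| + l^2\right) \\
&\ge \varepsilon\delta_\beta|w| - \frac{2m}{l}|w| \\
&= \left(\varepsilon\delta_\beta - \frac{2m}{l}\right)|w| \\
&\ge \frac{\varepsilon\delta_\beta}{3}|w| = \delta|w|,
\end{aligned} \tag{A.2}$$

i.e., that

$$|w| \ge l^2 \implies \varepsilon\delta_\beta|w| - m\left(\frac{|w|}{l} + l\right) \ge \delta|w|. \tag{A.3}$$

It follows that, for all $w \in I$ with $|w| \ge l^2$, we have

$$\begin{aligned}
\log d^{(s)}_{G,\beta}(w) &= \log\left(\frac{\mu(w)}{\beta(w)^s}\, d_G(w)\right) \\
&= -|w| \log k + s\,\mathcal{I}_\beta(w) + \log d_G(w) \\
&\ge s\,\mathcal{I}_\beta(w) - |C(w)| - m\left(\frac{|w|}{l} + l\right) && \text{by (A.1)} \\
&> \varepsilon\,\mathcal{I}_\beta(w) - m\left(\frac{|w|}{l} + l\right) && \text{by (4.1)} \\
&\ge \varepsilon\delta_\beta|w| - m\left(\frac{|w|}{l} + l\right) && \text{by (ii)} \\
&\ge \delta|w| && \text{by (A.3)}.
\end{aligned} \tag{A.4}$$

Hence (4.2) holds.

□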
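The arithmetic behind (A.2) and (A.3) can be checked numerically for sample parameters. In the Python sketch below, $m$ stands for whatever constant Lemma A.1 provides (the value 3 is arbitrary), the measure $\beta$ is a sample, and the helper names are hypothetical; a finite check of this kind complements, but does not replace, the algebra above.

```python
import math

def lemma_4_1_constants(beta, eps, m):
    """Return (delta_beta, l, delta) as chosen in the proof of Lemma 4.1 (hypothetical helper)."""
    delta_beta = min(math.log2(1 / p) for p in beta.values())
    l = math.ceil(3 * m / (eps * delta_beta))
    delta = eps * delta_beta / 3
    return delta_beta, l, delta

def implication_holds(beta, eps, m, extra=5000):
    """Check eps*delta_beta*n - m*(n/l + l) >= delta*n for all lengths n in [l**2, l**2 + extra]."""
    delta_beta, l, delta = lemma_4_1_constants(beta, eps, m)
    return all(
        eps * delta_beta * n - m * (n / l + l) >= delta * n - 1e-9
        for n in range(l * l, l * l + extra + 1)
    )

beta = {"0": 0.25, "1": 0.75}
delta_beta, l, delta = lemma_4_1_constants(beta, eps=0.5, m=3)
print(l, implication_holds(beta, eps=0.5, m=3))  # 44 True
```

The threshold $|w| \ge l^2$ is genuinely needed: for very short strings the term $m\,l$ dominates and the inequality fails.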

An FSG $G = (Q, \delta, q_0, B)$ is nonvanishing if all its bets are nonzero, i.e., if $B(q)(a) > 0$ holds for all $q \in Q$ and $a \in \Sigma$.

Lemma A.2 ([6]). For each FSG $G$ and each $\delta > 0$, there is a nonvanishing FSG $G'$ such that, for all $w \in \Sigma^*$,

$$d_{G'}(w) \ge k^{-\delta|w|}\, d_G(w). \tag{A.5}$$



The following lemma summarizes the second part of the proof of Theorem 2.7.

Lemma A.3 ([6]). For each nonvanishing FSG $G$ and each $l \in \mathbb{Z}^+$, there exists an ILFSC $C$ such that, for all $w \in \Sigma^*$,

$$|C(w)| \le \left(1 + \frac{1}{l}\right)|w| \log k - \log d_G(w). \tag{A.6}$$
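Proof of Lemma 4.2. Assume the hypothesis. Let

$$\gamma = \log \frac{1}{\beta_{\max}}, \qquad \text{where} \qquad \beta_{\max} = \max_{a \in \Sigma} \beta(a).$$

Note that $\gamma > 0$ (because $\beta$ is positive and $k \ge 2$, so $\beta_{\max} < 1$) and that, for all $w \in \Sigma^*$,

$$\mathcal{I}_\beta(w) \ge \gamma|w|. \tag{A.7}$$

Let

$$\delta = \frac{\gamma\varepsilon}{\log k},$$

and choose $G'$ for $G$ and $\delta$ as in Lemma A.2. Let

$$l = \left\lceil \frac{\log k}{\gamma\varepsilon} \right\rceil,$$

and choose $C$ for $G'$ and $l$ as in Lemma A.3. Then, for all $w \in I$,

$$\begin{aligned}
|C(w)| &\le \left(1 + \frac{1}{l}\right)|w| \log k - \log d_{G'}(w) && \text{by (A.6)} \\
&\le |w|(\gamma\varepsilon + \log k) - \log d_{G'}(w) \\
&\le |w|(\gamma\varepsilon + \log k) - \log\left(k^{-\delta|w|}\, d_G(w)\right) && \text{by (A.5)} \\
&= |w|(\gamma\varepsilon + \log k + \delta \log k) - \log d_G(w) \\
&= |w|(2\gamma\varepsilon + \log k) - \log d_G(w).
\end{aligned} \tag{A.8}$$

Since $d_G(w) = \frac{\beta(w)^{s-2\varepsilon}}{\mu(w)}\, d^{(s-2\varepsilon)}_{G,\beta}(w)$, it follows that

$$\begin{aligned}
|C(w)| &\le |w|(2\gamma\varepsilon + \log k) - \log\left(\frac{\beta(w)^{s-2\varepsilon}}{\mu(w)}\, d^{(s-2\varepsilon)}_{G,\beta}(w)\right) \\
&\le |w|(2\gamma\varepsilon + \log k) - \log\frac{\beta(w)^{s-2\varepsilon}}{\mu(w)} && \text{by (4.3)} \\
&= |w|(2\gamma\varepsilon + \log k) - \log\left(k^{|w|}\beta(w)^{s-2\varepsilon}\right) \\
&= 2\gamma\varepsilon|w| - \log \beta(w)^{s-2\varepsilon} \\
&= 2\gamma\varepsilon|w| + (s - 2\varepsilon)\,\mathcal{I}_\beta(w) \\
&\le s\,\mathcal{I}_\beta(w) && \text{by (A.7)}.
\end{aligned} \tag{A.9}$$

□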
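The parameter choices in this proof can be checked numerically as well. The Python sketch below assumes a three-letter alphabet with a sample positive $\beta$ and sample values of $s$ and $\varepsilon$; the helper names are hypothetical. Random strings stand in for the prefixes in $I$, which is legitimate for the last step of (A.9) because that step uses only (A.7).

```python
import math
import random

def lemma_4_2_constants(beta, eps):
    """Return (gamma, delta, l) as chosen in the proof of Lemma 4.2 (hypothetical helper)."""
    log_k = math.log2(len(beta))
    gamma = math.log2(1 / max(beta.values()))
    delta = gamma * eps / log_k
    l = math.ceil(log_k / (gamma * eps))
    return gamma, delta, l

def self_information(beta, w):
    """I_beta(w) = sum over i of log2(1 / beta(w[i]))."""
    return sum(math.log2(1 / beta[a]) for a in w)

def check_lemma_4_2(beta, eps, s, trials=1000, seed=7):
    """Check the two parameter facts used in (A.8) and the last step of (A.9) on random strings."""
    gamma, delta, l = lemma_4_2_constants(beta, eps)
    log_k = math.log2(len(beta))
    # (1/l) log k <= gamma*eps justifies the second line of (A.8); delta log k = gamma*eps the fourth.
    if log_k / l > gamma * eps or not math.isclose(delta * log_k, gamma * eps):
        return False
    rng = random.Random(seed)
    symbols = list(beta)
    for _ in range(trials):
        w = "".join(rng.choices(symbols, k=rng.randint(1, 50)))
        i_beta = self_information(beta, w)
        if 2 * gamma * eps * len(w) + (s - 2 * eps) * i_beta > s * i_beta + 1e-9:
            return False
    return True

beta = {"a": 0.5, "b": 0.3, "c": 0.2}
print(lemma_4_2_constants(beta, eps=0.05)[2], check_lemma_4_2(beta, eps=0.05, s=0.9))  # 32 True
```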



Proof of Lemma 5.2. As in the proof of Lemma 3.3, the hypothesis implies that

$$\frac{\mathcal{I}_\mu(w)}{\mathcal{I}_\beta(w)} = \frac{1}{\mathcal{H}_k(\alpha) + \mathcal{D}_k(\alpha\|\beta)} + o(1)$$

as $w \to S$. The present lemma follows from this and Lemma 5.1. □
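Proof of Theorem 5.3. Assume the hypothesis, and let $l \in \mathbb{Z}^+$. Let $\alpha^{(l)}$ be the restriction of the product probability measure $\alpha$ to $\Sigma^l$, noting that $\mathcal{H}(\alpha^{(l)}) = l\,\mathcal{H}(\alpha)$. We first show that

$$\lim_{n \to \infty} \mathcal{H}(\pi^{(l)}_{R,n}) = \mathcal{H}(\alpha^{(l)}), \tag{A.10}$$

where $\pi^{(l)}_{R,n}$ is the empirical probability measure defined in section 2.5. To see this, let $\varepsilon > 0$. By the continuity of the entropy function, there is a real number $\delta > 0$ such that, for all probability measures $\pi$ on $\Sigma^l$,

$$\max_{w \in \Sigma^l} |\pi(w) - \alpha^{(l)}(w)| < \delta \implies |\mathcal{H}(\pi) - \mathcal{H}(\alpha^{(l)})| < \varepsilon.$$

Since $R$ is $\alpha$-normal, there is, for each $w \in \Sigma^l$, a positive integer $n_w$ such that, for all $n \ge n_w$,

$$|\pi^{(l)}_{R,n}(w) - \alpha^{(l)}(w)| = |\pi_{R,n}(w) - \alpha(w)| < \delta.$$

Let $N = \max_{w \in \Sigma^l} n_w$. Then, for all $n \ge N$, we have $|\mathcal{H}(\pi^{(l)}_{R,n}) - \mathcal{H}(\alpha^{(l)})| < \varepsilon$, confirming (A.10). Since the limit in (A.10) exists, $\mathcal{H}^-_l(R) = \mathcal{H}^+_l(R)$ for every $l$, so by Theorem 2.5 we now have

$$\begin{aligned}
\dim_{\mathrm{FS}}(R) = \mathrm{Dim}_{\mathrm{FS}}(R) &= \inf_{l \in \mathbb{Z}^+} \frac{1}{l \log k} \lim_{n \to \infty} \mathcal{H}(\pi^{(l)}_{R,n}) \\
&= \inf_{l \in \mathbb{Z}^+} \frac{1}{l \log k}\, \mathcal{H}(\alpha^{(l)}) \\
&= \frac{\mathcal{H}(\alpha)}{\log k} = \mathcal{H}_k(\alpha).
\end{aligned}$$

□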
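The identity $\mathcal{H}(\alpha^{(l)}) = l\,\mathcal{H}(\alpha)$ used at the start of this proof, and the resulting constancy of the normalized block entropies in the last display, can be confirmed numerically. This Python sketch assumes a sample measure on a three-letter alphabet; the helper names are hypothetical.

```python
import itertools
import math

def entropy(measure):
    """Shannon entropy in bits of a finite probability measure given as a dict."""
    return sum(p * math.log2(1 / p) for p in measure.values() if p > 0)

def restriction_to_length(alpha, l):
    """alpha^(l): the product measure alpha(w) = prod_i alpha(w[i]) on strings of length l."""
    return {
        "".join(w): math.prod(alpha[a] for a in w)
        for w in itertools.product(alpha, repeat=l)
    }

alpha = {"0": 0.1, "1": 0.6, "2": 0.3}
k = len(alpha)
for l in range(1, 5):
    # Each normalized value H(alpha^(l)) / (l log k) should equal H_k(alpha) = H(alpha) / log k.
    print(l, entropy(restriction_to_length(alpha, l)) / (l * math.log2(k)))
```

Because every $l$ gives the same normalized value, the infimum over $l$ in Theorem 2.5 is attained already at $l = 1$ for $\alpha$-normal sequences.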